CUSTOMER SEGMENTATION USING MACHINE LEARNING

                    (adapted from Kaggle)

Appendix-1

Importing necessary libraries

Reading the data

Exploring the data

Performing Exploratory Data Analysis

We can observe that there are almost no outliers.
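One common way to back up an observation like this is the 1.5 x IQR rule that box plots use. The sketch below is illustrative only: the data is synthetic, and the helper name count_iqr_outliers and the column values are assumptions, not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column values are assumptions for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(18, 70, size=200),
    "Annual Income": rng.normal(60, 25, size=200).round(1),
})

def count_iqr_outliers(series: pd.Series, k: float = 1.5) -> int:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (the box-plot rule)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return int(mask.sum())

# Number of flagged points per numeric column
outlier_counts = {col: count_iqr_outliers(df[col]) for col in df.columns}
print(outlier_counts)
```

A count that is zero or very small relative to the number of rows supports leaving the data as it is.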

It can be observed that the data in the Age and Annual Income columns is skewed, but the skewness is of low magnitude,

hence no processing to remove skewness is done.
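A rough way to quantify "low magnitude" is to compute the sample skewness per column. The sketch below uses synthetic, mildly right-skewed data; the threshold of 1.0 is a common rule of thumb and an assumption here, not a value stated in the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic, mildly right-skewed stand-in data (assumption for illustration).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Age": rng.gamma(shape=9.0, scale=4.0, size=500),
    "Annual Income": rng.gamma(shape=6.0, scale=10.0, size=500),
})

# Sample skewness per column: 0 means symmetric, positive means a right tail.
skewness = df.skew()
print(skewness)

# Assumed rule of thumb: only consider a transform when |skew| exceeds 1.
THRESHOLD = 1.0
needs_transform = skewness[skewness.abs() > THRESHOLD].index.tolist()
print("Columns that may need a transform:", needs_transform)
```

If a column did exceed the chosen threshold, a log or power transform would be the usual next step.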

Data Cleaning

There are no null values in the data, so no cleaning is needed.
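A null check of this kind can be sketched as follows. The small DataFrame is a made-up stand-in for the real data set.

```python
import pandas as pd

# Tiny made-up frame standing in for the real data set.
df = pd.DataFrame({
    "Age": [19, 21, 35, 40],
    "Annual Income": [15.0, 16.0, 54.0, 78.0],
})

# Number of missing values in each column
null_counts = df.isnull().sum()
print(null_counts)

# True if any column contains at least one missing value
has_nulls = bool(null_counts.any())
print("Cleaning needed:", has_nulls)
```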

K-Means Clustering

Since K-Means is a distance-based algorithm, the features are scaled first.
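The text does not say which scaler was used. The sketch below assumes standardization with scikit-learn's StandardScaler, applied to a few made-up rows, to show why scaling matters: without it, the column with the larger numeric range would dominate the Euclidean distances.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows: [age, annual income]; the values are illustrative only.
X = np.array([
    [19, 15.0],
    [21, 16.0],
    [35, 54.0],
    [52, 88.0],
    [64, 120.0],
])

# Standardize each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```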

As part of hyperparameter tuning, and to decide the optimum number of clusters, the elbow method is adopted.
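The elbow method can be sketched as below: fit K-Means for a range of cluster counts and record the within-cluster sum of squares (the inertia_ attribute in scikit-learn). The data comes from make_blobs and the range 1 to 10 is an assumption, so the numbers will not match the original notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the scaled customer features.
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=1.0, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Fit one model per candidate k and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X_scaled)
    inertias[k] = model.inertia_

for k, wcss in inertias.items():
    print(f"k={k:2d}  WCSS={wcss:.2f}")
```

Plotting these values against k and looking for the point where the curve flattens gives the candidate cluster counts.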

The best number of clusters may be 6 or 5 as per the elbow method. The best way to decide is to generate the model for both 5 clusters and 6 clusters and compare their silhouette scores.

Points to remember while calculating the silhouette coefficient: its value lies in the range [-1, 1]. A score of 1 is the best, meaning that the data point i is very compact within the cluster to which it belongs and far away from the other clusters. The worst value is -1. Values near 0 denote overlapping clusters.
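For reference, the standard definition behind these properties can be written as below. The symbols a(i) and b(i) are not used in the text above; here a(i) is the mean distance from point i to the other points in its own cluster, and b(i) is the mean distance from point i to the points of the nearest other cluster.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

When a(i) is much smaller than b(i), s(i) approaches 1; when the two are about equal, s(i) is near 0; and when a(i) is much larger than b(i), s(i) approaches -1. The overall silhouette score is the mean of s(i) over all points.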

Generating the model with 6 clusters

Generating the model with 5 clusters

The silhouette_score for 6 clusters is 0.42, whereas it is 0.41 for 5 clusters, so the 6-cluster model is taken for further analysis.
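A comparison of this kind can be sketched as follows. The data is synthetic (make_blobs), so the scores it prints are not the 0.42 and 0.41 reported above; the random_state and n_init values are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the scaled customer features.
X, _ = make_blobs(n_samples=300, centers=6, random_state=7)
X_scaled = StandardScaler().fit_transform(X)

# Fit one model per candidate cluster count and score each labelling.
scores = {}
for k in (5, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=7).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
    print(f"k={k}: silhouette score = {scores[k]:.3f}")

# Keep the cluster count with the higher mean silhouette coefficient.
best_k = max(scores, key=scores.get)
print("Selected number of clusters:", best_k)
```

When two scores are as close as 0.42 and 0.41, the difference is small, so it can be worth also checking how interpretable the resulting segments are.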

Check for Cluster Magnitude
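Assuming "cluster magnitude" here means the sum of distances from every point in a cluster to that cluster's centroid, and "cardinality" means the number of points in the cluster, the check can be sketched as below. The data is synthetic and the variable names are illustrative, not taken from the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the customer features.
X, _ = make_blobs(n_samples=300, centers=6, random_state=7)
n_clusters = 6
model = KMeans(n_clusters=n_clusters, n_init=10, random_state=7).fit(X)

# Euclidean distance from each point to the centroid of its assigned cluster
distances = np.linalg.norm(X - model.cluster_centers_[model.labels_], axis=1)

# Cardinality: points per cluster. Magnitude: summed distances per cluster.
cardinality = np.bincount(model.labels_, minlength=n_clusters)
magnitude = np.bincount(model.labels_, weights=distances, minlength=n_clusters)

for c in range(n_clusters):
    print(f"cluster {c}: cardinality={cardinality[c]:3d}  magnitude={magnitude[c]:.2f}")
```

Clusters whose magnitude is far out of proportion to their cardinality are the ones worth inspecting more closely.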